Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 500 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.4 MiB |
| Average record size in memory | 2.8 KiB |
Variable types
| Numeric | 7 |
|---|---|
| Categorical | 1 |
Alerts
| original_title has a high cardinality: 500 distinct values | High cardinality |
|---|---|
| popularity is highly correlated with budget and 3 other fields | High correlation |
| revenue is highly correlated with budget and 2 other fields | High correlation |
| vote_average is highly correlated with popularity and 1 other field | High correlation |
| vote_count is highly correlated with popularity and 2 other fields | High correlation |
| df_index is highly correlated with release_year | High correlation |
| budget is highly correlated with popularity and 1 other field | High correlation |
| release_year is highly correlated with df_index | High correlation |
| original_title is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| original_title has unique values | Unique |
| popularity has unique values | Unique |
| revenue has unique values | Unique |
Reproduction
| Analysis started | 2022-10-19 09:54:12.476205 |
|---|---|
| Analysis finished | 2022-10-19 09:54:21.565601 |
| Duration | 9.09 seconds |
| Software version | pandas-profiling v3.3.0 |
df_index
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2030.428 |
| Minimum | 0 |
|---|---|
| Maximum | 14383 |
| Zeros | 1 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 119.4 |
| Q1 | 649 |
| median | 1639.5 |
| Q3 | 2646.25 |
| 95-th percentile | 6663.5 |
| Maximum | 14383 |
| Range | 14383 |
| Interquartile range (IQR) | 1997.25 |
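The quantile rows above are linked by simple arithmetic. Range = Maximum − Minimum and IQR = Q3 − Q1, which for df_index gives 14383 − 0 = 14383 and 2646.25 − 649 = 1997.25. The minimal sketch below computes the same rows from raw values using only Python's standard library. It assumes linear interpolation between sorted values via `statistics.quantiles(..., method="inclusive")`, which as far as we know matches NumPy's default percentile method. Whether pandas-profiling uses exactly this convention is an assumption, and `quantile_summary` is a hypothetical helper name.

```python
import statistics


def quantile_summary(values):
    """Return the quantile-statistics rows for a list of numbers.

    Assumption: percentiles use linear interpolation between sorted values
    (statistics.quantiles with method="inclusive").
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # 99 cut points: index k - 1 holds the k-th percentile.
    percentiles = statistics.quantiles(data, n=100, method="inclusive")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Minimum": data[0],
        "5-th percentile": percentiles[4],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95-th percentile": percentiles[94],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": q3 - q1,
    }


if __name__ == "__main__":
    print(quantile_summary([1, 2, 3, 4]))
```

For `[1, 2, 3, 4]` this gives Q1 = 1.75, median = 2.5 and Q3 = 3.25, so the IQR is 1.5.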
Descriptive statistics
| Standard deviation | 1935.169859 |
|---|---|
| Coefficient of variation (CV) | 0.9530846987 |
| Kurtosis | 8.166330218 |
| Mean | 2030.428 |
| Median Absolute Deviation (MAD) | 999 |
| Skewness | 2.33464605 |
| Sum | 1015214 |
| Variance | 3744882.382 |
| Monotonicity | Not monotonic |
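Several descriptive statistics are derived from others. The variance is the square of the standard deviation. The coefficient of variation is CV = standard deviation / mean; here 1935.169859 / 2030.428 ≈ 0.9531. The MAD is the median of the absolute deviations from the median. The sketch below reproduces these rows with Python's `statistics` module. It assumes the sample (n − 1) standard deviation, which is pandas' default; whether this report uses it is an assumption. The helper names are hypothetical, and the monotonicity labels other than "Decreasing" and "Not monotonic" (the two seen in this report) are assumed. Skewness and kurtosis are omitted.

```python
import statistics


def monotonicity(data):
    """Classify a sequence as strictly/non-strictly increasing or decreasing."""
    pairs = list(zip(data, data[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


def descriptive_summary(values):
    """Return a subset of the descriptive-statistics rows."""
    data = list(values)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    median = statistics.median(data)
    return {
        "Standard deviation": sd,
        "Coefficient of variation (CV)": sd / mean,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": statistics.median(
            [abs(x - median) for x in data]
        ),
        "Sum": sum(data),
        "Variance": statistics.variance(data),  # equals sd ** 2 up to rounding
        "Monotonicity": monotonicity(data),
    }


if __name__ == "__main__":
    print(descriptive_summary([4, 2, 2, 1]))
```

A non-increasing sequence with ties, such as the budget column sorted from largest to smallest, is classified as "Decreasing" rather than "Strictly decreasing".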
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1639 | 1 | 0.2% |
| 2015 | 1 | 0.2% |
| 3677 | 1 | 0.2% |
| 2223 | 1 | 0.2% |
| 3724 | 1 | 0.2% |
| 6648 | 1 | 0.2% |
| 7075 | 1 | 0.2% |
| 1774 | 1 | 0.2% |
| 1537 | 1 | 0.2% |
| 2241 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.2% |
| 1 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| 18 | 1 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14383 | 1 | 0.2% |
| 14045 | 1 | 0.2% |
| 10098 | 1 | 0.2% |
| 9643 | 1 | 0.2% |
| 9118 | 1 | 0.2% |
| 8943 | 1 | 0.2% |
| 8278 | 1 | 0.2% |
| 8178 | 1 | 0.2% |
| 7993 | 1 | 0.2% |
| 7859 | 1 | 0.2% |
budget
| Distinct | 80 |
|---|---|
| Distinct (%) | 16.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 47689593.1 |
| Minimum | 19000000 |
|---|---|
| Maximum | 200000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.5 KiB |
Quantile statistics
| Minimum | 19000000 |
|---|---|
| 5-th percentile | 20000000 |
| Q1 | 27000000 |
| median | 40000000 |
| Q3 | 60000000 |
| 95-th percentile | 95150000 |
| Maximum | 200000000 |
| Range | 181000000 |
| Interquartile range (IQR) | 33000000 |
Descriptive statistics
| Standard deviation | 26749694.59 |
|---|---|
| Coefficient of variation (CV) | 0.5609126194 |
| Kurtosis | 4.718904466 |
| Mean | 47689593.1 |
| Median Absolute Deviation (MAD) | 15000000 |
| Skewness | 1.78498463 |
| Sum | 2.384479655 × 10¹⁰ |
| Variance | 7.155461606 × 10¹⁴ |
| Monotonicity | Decreasing |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 25000000 | 36 | 7.2% |
| 30000000 | 31 | 6.2% |
| 50000000 | 28 | 5.6% |
| 40000000 | 27 | 5.4% |
| 20000000 | 25 | 5.0% |
| 60000000 | 23 | 4.6% |
| 35000000 | 22 | 4.4% |
| 45000000 | 18 | 3.6% |
| 70000000 | 17 | 3.4% |
| 80000000 | 15 | 3.0% |
| Other values (70) | 258 | 51.6% |
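Each Common Values table lists the most frequent values with their counts and Frequency (%) = count / number of observations. For example, 36 / 500 = 7.2% for a budget of 25000000. All remaining distinct values are collapsed into an "Other values (k)" row. The sketch below shows one way to build such a table with `collections.Counter`. The function name and the one-decimal percentage formatting are assumptions; this is not pandas-profiling's own code.

```python
from collections import Counter


def common_values(values, top=10):
    """Return (value, count, frequency) rows plus an 'Other values (k)' row."""
    values = list(values)
    n = len(values)
    counts = Counter(values).most_common()  # sorted by count, descending

    def pct(count):
        return f"{100 * count / n:.1f}%"

    rows = [(value, count, pct(count)) for value, count in counts[:top]]
    rest = counts[top:]
    if rest:
        # Collapse every remaining distinct value into a single summary row.
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, pct(other)))
    return rows


if __name__ == "__main__":
    for row in common_values([25, 25, 25, 30, 30, 50, 60, 70], top=2):
        print(row)
```

With `top=2` the example prints the rows for 25 and 30, followed by `('Other values (3)', 3, '37.5%')`.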
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19000000 | 5 | 1.0% |
| 19885552 | 1 | 0.2% |
| 20000000 | 25 | 5.0% |
| 21000000 | 4 | 0.8% |
| 22000000 | 12 | 2.4% |
| 23000000 | 12 | 2.4% |
| 24000000 | 9 | 1.8% |
| 25000000 | 36 | 7.2% |
| 25530000 | 1 | 0.2% |
| 26000000 | 8 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 200000000 | 1 | 0.2% |
| 175000000 | 1 | 0.2% |
| 170000000 | 1 | 0.2% |
| 160000000 | 1 | 0.2% |
| 150000000 | 1 | 0.2% |
| 140000000 | 2 | 0.4% |
| 135000000 | 1 | 0.2% |
| 133000000 | 1 | 0.2% |
| 130000000 | 1 | 0.2% |
| 125000000 | 1 | 0.2% |
original_title
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.3 MiB |
| Jurassic Park | 1 |
|---|---|
| Brokedown Palace | 1 |
| Space Truckers | 1 |
| Ghost | 1 |
| The Rock | 1 |
| Other values (495) | 495 |
Length
| Max length | 55 |
|---|---|
| Median length | 35 |
| Mean length | 14.384 |
| Min length | 3 |
Characters and Unicode
| Total characters | 7192 |
|---|---|
| Distinct characters | 102 |
| Distinct categories | 9 |
| Distinct scripts | 5 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
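Python's standard `unicodedata` module exposes one of these properties: the general category, such as `Ll` for a lowercase letter or `Zs` for a space separator. The sketch below counts characters by category and uses the long category names that appear in the tables below. The name mapping covers only the nine categories found in this report, and the function name is hypothetical. `unicodedata` does not provide scripts or blocks, so those are not covered.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this report (subset only).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lm": "Modifier Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "No": "Other Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(texts):
    """Count characters across all texts by Unicode general category."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    print(category_counts(["Lethal Weapon 4", "½"]))
```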
Unique
| Unique | 500 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | Titanic |
|---|---|
| 2nd row | Waterworld |
| 3rd row | Wild Wild West |
| 4th row | The 13th Warrior |
| 5th row | Tarzan |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Jurassic Park | 1 | 0.2% |
| Brokedown Palace | 1 | 0.2% |
| Space Truckers | 1 | 0.2% |
| Ghost | 1 | 0.2% |
| The Rock | 1 | 0.2% |
| Another 48 Hrs. | 1 | 0.2% |
| 8MM | 1 | 0.2% |
| The Long Kiss Goodnight | 1 | 0.2% |
| The Hunchback of Notre Dame | 1 | 0.2% |
| How Stella Got Her Groove Back | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |
Length
Histogram of lengths of the category
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| the | 156 | 12.1% |
| of | 36 | 2.8% |
| in | 17 | 1.3% |
| a | 15 | 1.2% |
| 2 | 10 | 0.8% |
| and | 10 | 0.8% |
| man | 9 | 0.7% |
| i | 6 | 0.5% |
| city | 6 | 0.5% |
| world | 6 | 0.5% |
| Other values (808) | 1018 | 79.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 789 | 11.0% |
| e | 759 | 10.6% |
| a | 458 | 6.4% |
| r | 408 | 5.7% |
| n | 388 | 5.4% |
| o | 383 | 5.3% |
| t | 382 | 5.3% |
| i | 372 | 5.2% |
| s | 285 | 4.0% |
| h | 274 | 3.8% |
| Other values (92) | 2694 | 37.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 5080 | 70.6% |
| Uppercase Letter | 1158 | 16.1% |
| Space Separator | 789 | 11.0% |
| Other Punctuation | 71 | 1.0% |
| Other Letter | 45 | 0.6% |
| Decimal Number | 40 | 0.6% |
| Dash Punctuation | 3 | < 0.1% |
| Modifier Letter | 3 | < 0.1% |
| Other Number | 3 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 759 | 14.9% |
| a | 458 | 9.0% |
| r | 408 | 8.0% |
| n | 388 | 7.6% |
| o | 383 | 7.5% |
| t | 382 | 7.5% |
| i | 372 | 7.3% |
| s | 285 | 5.6% |
| h | 274 | 5.4% |
| l | 235 | 4.6% |
| Other values (17) | 1136 | 22.4% |
Other Letter
| Value | Count | Frequency (%) |
|---|---|---|
| の | 4 | 8.9% |
| ポ | 3 | 6.7% |
| ケ | 3 | 6.7% |
| モ | 3 | 6.7% |
| ン | 3 | 6.7% |
| ト | 2 | 4.4% |
| 劇 | 2 | 4.4% |
| ッ | 2 | 4.4% |
| ス | 2 | 4.4% |
| タ | 2 | 4.4% |
| Other values (17) | 19 | 42.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| T | 178 | 15.4% |
| S | 104 | 9.0% |
| M | 82 | 7.1% |
| A | 69 | 6.0% |
| B | 68 | 5.9% |
| D | 66 | 5.7% |
| P | 60 | 5.2% |
| C | 59 | 5.1% |
| F | 54 | 4.7% |
| I | 53 | 4.6% |
| Other values (16) | 365 | 31.5% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| : | 30 | 42.3% |
| ' | 17 | 23.9% |
| . | 16 | 22.5% |
| ! | 3 | 4.2% |
| & | 2 | 2.8% |
| ? | 1 | 1.4% |
| / | 1 | 1.4% |
| , | 1 | 1.4% |
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 14 | 35.0% |
| 0 | 7 | 17.5% |
| 3 | 7 | 17.5% |
| 1 | 5 | 12.5% |
| 4 | 3 | 7.5% |
| 8 | 2 | 5.0% |
| 9 | 1 | 2.5% |
| 7 | 1 | 2.5% |
Other Number
| Value | Count | Frequency (%) |
|---|---|---|
| ³ | 1 | 33.3% |
| ⅓ | 1 | 33.3% |
| ½ | 1 | 33.3% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 789 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 3 | 100.0% |
Modifier Letter
| Value | Count | Frequency (%) |
|---|---|---|
| ー | 3 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6238 | 86.7% |
| Common | 909 | 12.6% |
| Katakana | 27 | 0.4% |
| Han | 12 | 0.2% |
| Hiragana | 6 | 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 759 | 12.2% |
| a | 458 | 7.3% |
| r | 408 | 6.5% |
| n | 388 | 6.2% |
| o | 383 | 6.1% |
| t | 382 | 6.1% |
| i | 372 | 6.0% |
| s | 285 | 4.6% |
| h | 274 | 4.4% |
| l | 235 | 3.8% |
| Other values (43) | 2294 | 36.8% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 789 | 86.8% |
| : | 30 | 3.3% |
| ' | 17 | 1.9% |
| . | 16 | 1.8% |
| 2 | 14 | 1.5% |
| 0 | 7 | 0.8% |
| 3 | 7 | 0.8% |
| 1 | 5 | 0.6% |
| ! | 3 | 0.3% |
| - | 3 | 0.3% |
| Other values (12) | 18 | 2.0% |
Katakana
| Value | Count | Frequency (%) |
|---|---|---|
| ポ | 3 | 11.1% |
| ケ | 3 | 11.1% |
| モ | 3 | 11.1% |
| ン | 3 | 11.1% |
| ト | 2 | 7.4% |
| ッ | 2 | 7.4% |
| ス | 2 | 7.4% |
| タ | 2 | 7.4% |
| ミ | 1 | 3.7% |
| ュ | 1 | 3.7% |
| Other values (5) | 5 | 18.5% |
Han
| Value | Count | Frequency (%) |
|---|---|---|
| 劇 | 2 | 16.7% |
| 版 | 2 | 16.7% |
| 場 | 2 | 16.7% |
| 逆 | 1 | 8.3% |
| 襲 | 1 | 8.3% |
| 誕 | 1 | 8.3% |
| 爆 | 1 | 8.3% |
| 幻 | 1 | 8.3% |
| 姫 | 1 | 8.3% |
Hiragana
| Value | Count | Frequency (%) |
|---|---|---|
| の | 4 | 66.7% |
| け | 1 | 16.7% |
| も | 1 | 16.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 7140 | 99.3% |
| Katakana | 30 | 0.4% |
| CJK | 12 | 0.2% |
| Hiragana | 6 | 0.1% |
| None | 3 | < 0.1% |
| Number Forms | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 789 | 11.1% |
| e | 759 | 10.6% |
| a | 458 | 6.4% |
| r | 408 | 5.7% |
| n | 388 | 5.4% |
| o | 383 | 5.4% |
| t | 382 | 5.4% |
| i | 372 | 5.2% |
| s | 285 | 4.0% |
| h | 274 | 3.8% |
| Other values (60) | 2642 | 37.0% |
Hiragana
| Value | Count | Frequency (%) |
|---|---|---|
| の | 4 | 66.7% |
| け | 1 | 16.7% |
| も | 1 | 16.7% |
Katakana
| Value | Count | Frequency (%) |
|---|---|---|
| ポ | 3 | 10.0% |
| ケ | 3 | 10.0% |
| モ | 3 | 10.0% |
| ン | 3 | 10.0% |
| ー | 3 | 10.0% |
| ト | 2 | 6.7% |
| ッ | 2 | 6.7% |
| ス | 2 | 6.7% |
| タ | 2 | 6.7% |
| ミ | 1 | 3.3% |
| Other values (6) | 6 | 20.0% |
CJK
| Value | Count | Frequency (%) |
|---|---|---|
| 劇 | 2 | 16.7% |
| 版 | 2 | 16.7% |
| 場 | 2 | 16.7% |
| 逆 | 1 | 8.3% |
| 襲 | 1 | 8.3% |
| 誕 | 1 | 8.3% |
| 爆 | 1 | 8.3% |
| 幻 | 1 | 8.3% |
| 姫 | 1 | 8.3% |
None
| Value | Count | Frequency (%) |
|---|---|---|
| ³ | 1 | 33.3% |
| ½ | 1 | 33.3% |
| è | 1 | 33.3% |
Number Forms
| Value | Count | Frequency (%) |
|---|---|---|
| ⅓ | 1 | 100.0% |
popularity
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.50606286 |
| Minimum | 0.788123 |
|---|---|
| Maximum | 63.869599 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.0 KiB |
Quantile statistics
| Minimum | 0.788123 |
|---|---|
| 5-th percentile | 3.4057613 |
| Q1 | 7.1689905 |
| median | 9.9880715 |
| Q3 | 12.87288425 |
| 95-th percentile | 17.93653895 |
| Maximum | 63.869599 |
| Range | 63.081476 |
| Interquartile range (IQR) | 5.70389375 |
Descriptive statistics
| Standard deviation | 5.911610257 |
|---|---|
| Coefficient of variation (CV) | 0.5626855976 |
| Kurtosis | 22.7435647 |
| Mean | 10.50606286 |
| Median Absolute Deviation (MAD) | 2.8636245 |
| Skewness | 3.415252715 |
| Sum | 5253.031428 |
| Variance | 34.94713583 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 26.88907 | 1 | 0.2% |
| 12.962525 | 1 | 0.2% |
| 7.554848 | 1 | 0.2% |
| 7.566321 | 1 | 0.2% |
| 3.034244 | 1 | 0.2% |
| 5.516316 | 1 | 0.2% |
| 11.781895 | 1 | 0.2% |
| 3.385342 | 1 | 0.2% |
| 15.91126 | 1 | 0.2% |
| 5.154373 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.788123 | 1 | 0.2% |
| 0.987196 | 1 | 0.2% |
| 1.408176 | 1 | 0.2% |
| 1.466461 | 1 | 0.2% |
| 1.562471 | 1 | 0.2% |
| 1.690768 | 1 | 0.2% |
| 1.914881 | 1 | 0.2% |
| 2.151436 | 1 | 0.2% |
| 2.263584 | 1 | 0.2% |
| 2.304986 | 1 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 63.869599 | 1 | 0.2% |
| 51.645403 | 1 | 0.2% |
| 48.307194 | 1 | 0.2% |
| 41.725123 | 1 | 0.2% |
| 39.39497 | 1 | 0.2% |
| 33.366332 | 1 | 0.2% |
| 26.88907 | 1 | 0.2% |
| 24.30526 | 1 | 0.2% |
| 23.984065 | 1 | 0.2% |
| 23.63659 | 1 | 0.2% |
release_year
| Distinct | 10 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1995.418 |
| Minimum | 1990 |
|---|---|
| Maximum | 1999 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.5 KiB |
Quantile statistics
| Minimum | 1990 |
|---|---|
| 5-th percentile | 1990 |
| Q1 | 1993 |
| median | 1996 |
| Q3 | 1998 |
| 95-th percentile | 1999 |
| Maximum | 1999 |
| Range | 9 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.783301913 |
|---|---|
| Coefficient of variation (CV) | 0.00139484655 |
| Kurtosis | -0.9469485018 |
| Mean | 1995.418 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.4463361136 |
| Sum | 997709 |
| Variance | 7.746769539 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1999 | 76 | 15.2% |
| 1997 | 70 | 14.0% |
| 1998 | 68 | 13.6% |
| 1996 | 60 | 12.0% |
| 1995 | 51 | 10.2% |
| 1994 | 43 | 8.6% |
| 1993 | 36 | 7.2% |
| 1991 | 34 | 6.8% |
| 1992 | 31 | 6.2% |
| 1990 | 31 | 6.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1990 | 31 | 6.2% |
| 1991 | 34 | 6.8% |
| 1992 | 31 | 6.2% |
| 1993 | 36 | 7.2% |
| 1994 | 43 | 8.6% |
| 1995 | 51 | 10.2% |
| 1996 | 60 | 12.0% |
| 1997 | 70 | 14.0% |
| 1998 | 68 | 13.6% |
| 1999 | 76 | 15.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1999 | 76 | 15.2% |
| 1998 | 68 | 13.6% |
| 1997 | 70 | 14.0% |
| 1996 | 60 | 12.0% |
| 1995 | 51 | 10.2% |
| 1994 | 43 | 8.6% |
| 1993 | 36 | 7.2% |
| 1992 | 31 | 6.2% |
| 1991 | 34 | 6.8% |
| 1990 | 31 | 6.2% |
revenue
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 128721634.6 |
| Minimum | 71368 |
|---|---|
| Maximum | 1845034240 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.5 KiB |
Quantile statistics
| Minimum | 71368 |
|---|---|
| 5-th percentile | 7400646 |
| Q1 | 22594055.5 |
| median | 75252928 |
| Q3 | 177995820 |
| 95-th percentile | 377427152 |
| Maximum | 1845034240 |
| Range | 1844962872 |
| Interquartile range (IQR) | 155401764.5 |
Descriptive statistics
| Standard deviation | 158997546.7 |
|---|---|
| Coefficient of variation (CV) | 1.235204534 |
| Kurtosis | 30.11120029 |
| Mean | 128721634.6 |
| Median Absolute Deviation (MAD) | 61024417.5 |
| Skewness | 3.926900804 |
| Sum | 6.436081731 × 10¹⁰ |
| Variance | 2.528021987 × 10¹⁶ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1845034240 | 1 | 0.2% |
| 378882400 | 1 | 0.2% |
| 171120336 | 1 | 0.2% |
| 222104688 | 1 | 0.2% |
| 61698900 | 1 | 0.2% |
| 448000000 | 1 | 0.2% |
| 285444608 | 1 | 0.2% |
| 553799552 | 1 | 0.2% |
| 361832384 | 1 | 0.2% |
| 300135360 | 1 | 0.2% |
| Other values (490) | 490 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 71368 | 1 | 0.2% |
| 305070 | 1 | 0.2% |
| 635096 | 1 | 0.2% |
| 777423 | 1 | 0.2% |
| 791830 | 1 | 0.2% |
| 1060056 | 1 | 0.2% |
| 1345903 | 1 | 0.2% |
| 1531251 | 1 | 0.2% |
| 1614266 | 1 | 0.2% |
| 2075084 | 1 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1845034240 | 1 | 0.2% |
| 924317568 | 1 | 0.2% |
| 920099968 | 1 | 0.2% |
| 816969280 | 1 | 0.2% |
| 788241792 | 1 | 0.2% |
| 677945408 | 1 | 0.2% |
| 672806272 | 1 | 0.2% |
| 589390528 | 1 | 0.2% |
| 553799552 | 1 | 0.2% |
| 520000000 | 1 | 0.2% |
vote_average
| Distinct | 43 |
|---|---|
| Distinct (%) | 8.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.256398437 |
| Minimum | 4.19921875 |
|---|---|
| Maximum | 8.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.1 KiB |
Quantile statistics
| Minimum | 4.19921875 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 5.69921875 |
| median | 6.30078125 |
| Q3 | 6.80078125 |
| 95-th percentile | 7.6015625 |
| Maximum | 8.5 |
| Range | 4.30078125 |
| Interquartile range (IQR) | 1.1015625 |
Descriptive statistics
| Standard deviation | 0.8041992188 |
|---|---|
| Coefficient of variation (CV) | 0.1285402819 |
| Kurtosis | -0.01870727539 |
| Mean | 6.256398437 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | -0.002069473267 |
| Sum | 3128.199219 |
| Variance | 0.6469726562 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=43)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.1015625 | 30 | 6.0% |
| 6.3984375 | 29 | 5.8% |
| 6.5 | 28 | 5.6% |
| 6.19921875 | 27 | 5.4% |
| 6.30078125 | 25 | 5.0% |
| 6.69921875 | 25 | 5.0% |
| 5.8984375 | 23 | 4.6% |
| 6.80078125 | 21 | 4.2% |
| 5.80078125 | 20 | 4.0% |
| 6 | 20 | 4.0% |
| Other values (33) | 252 | 50.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.19921875 | 3 | 0.6% |
| 4.30078125 | 1 | 0.2% |
| 4.3984375 | 4 | 0.8% |
| 4.5 | 5 | 1.0% |
| 4.6015625 | 2 | 0.4% |
| 4.69921875 | 3 | 0.6% |
| 4.80078125 | 3 | 0.6% |
| 4.8984375 | 3 | 0.6% |
| 5 | 10 | 2.0% |
| 5.1015625 | 11 | 2.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.5 | 1 | 0.2% |
| 8.296875 | 3 | 0.6% |
| 8.203125 | 5 | 1.0% |
| 8.1015625 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 7.8984375 | 2 | 0.4% |
| 7.80078125 | 2 | 0.4% |
| 7.69921875 | 7 | 1.4% |
| 7.6015625 | 6 | 1.2% |
| 7.5 | 7 | 1.4% |
vote_count
| Distinct | 408 |
|---|---|
| Distinct (%) | 81.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 762.288 |
| Minimum | 11 |
|---|---|
| Maximum | 9680 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 4.5 KiB |
Quantile statistics
| Minimum | 11 |
|---|---|
| 5-th percentile | 32.95 |
| Q1 | 140 |
| median | 361.5 |
| Q3 | 839.25 |
| 95-th percentile | 3124.6 |
| Maximum | 9680 |
| Range | 9669 |
| Interquartile range (IQR) | 699.25 |
Descriptive statistics
| Standard deviation | 1224.90015 |
|---|---|
| Coefficient of variation (CV) | 1.60687319 |
| Kurtosis | 18.7057271 |
| Mean | 762.288 |
| Median Absolute Deviation (MAD) | 270.5 |
| Skewness | 3.859023631 |
| Sum | 381144 |
| Variance | 1500380.378 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 49 | 4 | 0.8% |
| 128 | 4 | 0.8% |
| 86 | 3 | 0.6% |
| 522 | 3 | 0.6% |
| 140 | 3 | 0.6% |
| 52 | 3 | 0.6% |
| 381 | 3 | 0.6% |
| 20 | 3 | 0.6% |
| 26 | 3 | 0.6% |
| 300 | 3 | 0.6% |
| Other values (398) | 468 | 93.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 2 | 0.4% |
| 17 | 3 | 0.6% |
| 20 | 3 | 0.6% |
| 21 | 1 | 0.2% |
| 23 | 1 | 0.2% |
| 24 | 2 | 0.4% |
| 25 | 2 | 0.4% |
| 26 | 3 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9680 | 1 | 0.2% |
| 9080 | 1 | 0.2% |
| 8360 | 1 | 0.2% |
| 8148 | 1 | 0.2% |
| 7768 | 1 | 0.2% |
| 5916 | 1 | 0.2% |
| 5520 | 1 | 0.2% |
| 5416 | 1 | 0.2% |
| 5148 | 1 | 0.2% |
| 4956 | 1 | 0.2% |
Correlations
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
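Following the definition above, ρ is Pearson's r applied to the ranks of X and Y. The minimal pure-Python sketch below assumes that tied values receive the average of the ranks they span, which is the usual convention. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The normalising factors cancel, so plain sums of (cross-)products suffice.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 3 for v in x]
    print(spearman(x, y))  # monotonic relationship, so rho is 1
    print(pearson(x, y))   # below 1, because the relationship is not linear
```

For y = x³ the two rank vectors coincide, so ρ = 1. Pearson's r on the raw values is only about 0.94.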
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship, therefore, the slope of the line does not affect the magnitude of r: any increasing line gives r = +1 and any decreasing line gives r = -1. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
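The pure-Python sketch below follows the definition of r and checks the invariance claim: shifting and positively rescaling either variable leaves r unchanged, and an exact line gives ±1 whatever its slope. The function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors of the covariance and both standard deviations cancel,
    # so plain sums of (cross-)products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    print(pearson(x, y))
    # Same value after shifting and positively rescaling both variables.
    print(pearson([3 * a + 5 for a in x], [2 * b - 1 for b in y]))
    # Exact lines: +1 for a positive slope, -1 for a negative slope.
    print(pearson(x, [0.5 * a + 7 for a in x]), pearson(x, [-0.5 * a + 7 for a in x]))
```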
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative correlation, 0 indicates no correlation, and +1 indicates a perfectly positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
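The pair-counting definition above translates directly into code. This sketch implements that plain form, usually called τ-a, in which pairs tied in X or Y count as neither concordant nor discordant. Library implementations such as `scipy.stats.kendalltau` default to τ-b, which adjusts the denominator for ties, so results can differ when ties are present. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y; such a pair counts as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # 5 concordant pairs and 1 discordant pair out of 6.
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```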
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | df_index | budget | original_title | popularity | release_year | revenue | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 1639 | 200000000 | Titanic | 26.889070 | 1997 | 1845034240 | 7.500000 | 7768 |
| 1 | 205 | 175000000 | Waterworld | 16.885184 | 1995 | 264218224 | 5.898438 | 1017 |
| 2 | 2586 | 170000000 | Wild Wild West | 9.887602 | 1999 | 222104688 | 5.101562 | 1042 |
| 3 | 2711 | 160000000 | The 13th Warrior | 10.308026 | 1999 | 61698900 | 6.398438 | 524 |
| 4 | 2572 | 150000000 | Tarzan | 12.453452 | 1999 | 448000000 | 7.101562 | 1715 |
| 5 | 1809 | 140000000 | Lethal Weapon 4 | 14.470551 | 1998 | 285444608 | 6.300781 | 782 |
| 6 | 1808 | 140000000 | Armageddon | 13.235112 | 1998 | 553799552 | 6.500000 | 2540 |
| 7 | 2965 | 135000000 | The World Is Not Enough | 12.130127 | 1999 | 361832384 | 6.000000 | 878 |
| 8 | 3040 | 133000000 | Stuart Little | 8.359500 | 1999 | 300135360 | 5.800781 | 998 |
| 9 | 1773 | 130000000 | Godzilla | 11.295121 | 1998 | 379014304 | 5.300781 | 1075 |
Last rows
| | df_index | budget | original_title | popularity | release_year | revenue | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|
| 490 | 3378 | 20000000 | Misery | 15.020845 | 1990 | 61276872 | 7.601562 | 1085 |
| 491 | 450 | 20000000 | Free Willy | 5.990572 | 1993 | 153698624 | 6.000000 | 429 |
| 492 | 3435 | 20000000 | Jennifer Eight | 4.620936 | 1992 | 11390479 | 5.800781 | 74 |
| 493 | 3135 | 20000000 | Wayne's World | 10.180776 | 1992 | 121697320 | 6.500000 | 738 |
| 494 | 8278 | 19885552 | Fire in the Sky | 6.118228 | 1993 | 19724334 | 6.500000 | 128 |
| 495 | 69 | 19000000 | From Dusk Till Dawn | 15.339153 | 1996 | 25836616 | 6.898438 | 1644 |
| 496 | 6661 | 19000000 | Sleeping with the Enemy | 8.560694 | 1991 | 174999008 | 6.101562 | 228 |
| 497 | 142 | 19000000 | Bad Boys | 9.262184 | 1995 | 141407024 | 6.500000 | 1729 |
| 498 | 1521 | 19000000 | Picture Perfect | 4.347079 | 1997 | 44332016 | 5.101562 | 114 |
| 499 | 7200 | 19000000 | Clifford | 2.459293 | 1994 | 7411659 | 5.300781 | 17 |